NSF PAR Search | NSF Public Access Repository

Text to Blind Motion

Kim, Hee Jae; Sengupta, Kathakoli; Kuribayashi, Masaki; Kacorri, Hernisa; Ohn-Bar, Eshed (December 2024, Advances in Neural Information Processing Systems)

Full Text Available
Unified Local-Cloud Decision-Making via Reinforcement Learning

https://doi.org/10.1007/978-3-031-72940-9_11

Sengupta, Kathakoli; Shangguan, Zhongkai; Bharadwaj, Sandesh; Arora, Sanjay; Ohn-Bar, Eshed; Mancuso, Renato (November 2024, Springer Nature Switzerland)

Full Text Available
Learning to Drive Anywhere

Zhu, Ruizhao; Huang, Peng; Ohn-Bar, Eshed; Saligrama, Venkatesh (November 2023, Conference on Robot Learning)

Human drivers can seamlessly adapt their driving decisions across geographical locations with diverse conditions and rules of the road, e.g., left- vs. right-hand traffic. In contrast, existing models for autonomous driving have thus far been deployed only within restricted operational domains, i.e., without accounting for varying driving behaviors across locations or for model scalability. In this work, we propose AnyD, a single geographically aware conditional imitation learning (CIL) model that can efficiently learn from heterogeneous and globally distributed data with dynamic environmental, traffic, and social characteristics. Our key insight is to introduce a high-capacity geo-location-based channel attention mechanism that effectively adapts to local nuances while also flexibly modeling similarities among regions in a data-driven manner. By optimizing a contrastive imitation objective, our proposed approach can efficiently scale across inherently imbalanced data distributions and location-dependent events. We demonstrate the benefits of our AnyD agent across multiple datasets, cities, and scalable deployment paradigms, i.e., centralized, semi-supervised, and distributed agent training. Specifically, AnyD outperforms CIL baselines by over 14% in open-loop evaluation and 30% in closed-loop testing on CARLA.
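
To make the idea of location-conditioned channel attention concrete, here is a minimal squeeze-and-excitation-style sketch in which each channel's gate depends on both the pooled features and a per-region embedding. This is an illustrative assumption, not the AnyD architecture: the function name `geo_channel_attention`, the dictionary of region embeddings, and the single linear gating layer are all hypothetical, and a real model would learn these weights during training.

```python
import math
from typing import Dict, List


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def geo_channel_attention(
    features: List[List[float]],
    region: str,
    region_embeddings: Dict[str, List[float]],
    weights: List[List[float]],
) -> List[List[float]]:
    """Rescale C feature channels with gates conditioned on a region embedding.

    Hypothetical sketch, not the AnyD model.
    features: C channels, each a flat list of spatial activations.
    weights:  C rows (one gate per channel), each of length C + len(embedding).
    """
    if len(weights) != len(features):
        raise ValueError("need exactly one weight row per channel")
    # Squeeze: global average pool each channel to a single statistic
    pooled = [sum(ch) / len(ch) for ch in features]
    # Condition on location by appending the region's embedding
    z = pooled + region_embeddings[region]
    gates = []
    for row in weights:
        if len(row) != len(z):
            raise ValueError("each weight row must match len(pooled) + len(embedding)")
        # Excite: a linear score squashed into the 0-1 range gives this channel's gate
        gates.append(sigmoid(sum(w * v for w, v in zip(row, z))))
    # Rescale every activation in a channel by that channel's gate
    return [[g * a for a in ch] for g, ch in zip(gates, features)]
```

Because all regions share the same gating weights and differ only in their embeddings, regions with similar embeddings receive similar channel gates, which is one simple way to let the model share structure among regions while still adapting to each one.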
Full Text Available
Generalized Visual Odometry via Cross-Modal Self-Training

Lai, Lei; Shangguan, Zhongkai; Zhang, Jimuyang; Ohn-Bar, Eshed (October 2023, International Conference on Computer Vision)

We propose XVO, a semi-supervised learning method for training generalized monocular Visual Odometry (VO) models with robust off-the-shelf operation across diverse datasets and settings. In contrast to standard monocular VO approaches, which often study a known calibration within a single dataset, XVO efficiently learns to recover relative pose with real-world scale from visual scene semantics, i.e., without relying on any known camera parameters. We optimize the motion estimation model via self-training on large amounts of unconstrained and heterogeneous dash camera videos available on YouTube. Our key contribution is twofold. First, we empirically demonstrate the benefits of semi-supervised training for learning a general-purpose direct VO regression network. Second, we demonstrate that multi-modal supervision, including segmentation, flow, depth, and audio auxiliary prediction tasks, facilitates generalized representations for the VO task. Specifically, we find that the audio prediction task significantly enhances the semi-supervised learning process while alleviating noisy pseudo-labels, particularly in highly dynamic and out-of-domain video data. Our proposed teacher network achieves state-of-the-art performance on the commonly used KITTI benchmark despite using no multi-frame optimization or knowledge of camera parameters. Combined with the proposed semi-supervised step, XVO demonstrates off-the-shelf knowledge transfer across diverse conditions on KITTI, nuScenes, and Argoverse without fine-tuning.
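
XVO predicts the relative pose between frames. A trajectory is then recovered by chaining those relative motions, and when the relative translations carry real-world scale, the chained trajectory is metric as well. The sketch below shows that composition step in a simplified planar (SE(2)) setting, assuming each relative pose is expressed in the previous frame's coordinates. The `Pose2D` type and `integrate` helper are hypothetical, and XVO itself regresses full 6-DoF motion.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Pose2D:
    x: float      # metres, in the world frame
    y: float      # metres, in the world frame
    theta: float  # heading in radians

    def compose(self, rel: "Pose2D") -> "Pose2D":
        """Apply a relative motion `rel`, expressed in this pose's own frame."""
        c, s = math.cos(self.theta), math.sin(self.theta)
        heading = self.theta + rel.theta
        return Pose2D(
            # Rotate the relative translation into the world frame, then add it
            x=self.x + c * rel.x - s * rel.y,
            y=self.y + s * rel.x + c * rel.y,
            # Wrap the heading back into [-pi, pi]
            theta=math.atan2(math.sin(heading), math.cos(heading)),
        )


def integrate(relative_poses: List[Pose2D]) -> List[Pose2D]:
    """Chain frame-to-frame motions into a trajectory starting at the origin."""
    trajectory = [Pose2D(0.0, 0.0, 0.0)]
    for rel in relative_poses:
        trajectory.append(trajectory[-1].compose(rel))
    return trajectory
```

For example, moving 1 m forward and then turning 90 degrees left, repeated four times, traces a 1 m square and returns to the origin up to floating-point error.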
Full Text Available
Forecasting Time-to-Collision from Monocular Video: Feasibility, Dataset, and Challenges

https://doi.org/10.1109/IROS40897.2019.8967730

Manglik, Aashi; Weng, Xinshuo; Ohn-Bar, Eshed; Kitani, Kris M. (November 2019, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS))

We explore the possibility of using a single monocular camera to forecast the time to collision between a suitcase-shaped robot being pushed by its user and other nearby pedestrians. We develop a purely image-based deep learning approach that directly estimates the time to collision without relying on explicit geometric depth estimates or velocity information to predict future collisions. While previous work has focused on detecting immediate collisions in the context of navigating Unmanned Aerial Vehicles, that detection was limited to a binary variable (i.e., collision or no collision). We propose a more fine-grained approach to collision forecasting that predicts the exact time to collision in milliseconds, which is more helpful for collision avoidance in the context of dynamic path planning. To evaluate our method, we collected a novel dataset of over 13,000 indoor video segments, each showing a trajectory of at least one person ending in close proximity (a near collision) with the camera mounted on a mobile suitcase-shaped platform. Using this dataset, we conduct extensive experiments with different temporal windows as input across an exhaustive list of state-of-the-art convolutional neural networks (CNNs). Our results show that our proposed multi-stream CNN is the best model for predicting time to near-collision. The average prediction error of our time-to-near-collision estimates is 0.75 seconds across the test videos. The project webpage can be found at https://aashi7.github.io/NearCollision.html.
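
For contrast with the learned approach, the classical monocular geometric cue that it does not rely on can be sketched as follows. Assuming a pinhole camera and a pedestrian approaching at constant speed, the pedestrian's image size is proportional to 1/depth, so two size measurements taken dt seconds apart give the time to collision at the later frame as dt * s_prev / (s_curr - s_prev). The function name and the choice of size measure (for instance, a bounding-box height in pixels) are illustrative assumptions, not part of the paper.

```python
import math


def ttc_from_expansion(size_prev: float, size_curr: float, dt: float) -> float:
    """Time to collision (seconds) at the current frame from apparent-size growth.

    Assumes a pinhole camera and constant approach speed, so image size is
    proportional to 1/depth. Under those assumptions this is exact:
    TTC = dt * size_prev / (size_curr - size_prev).
    """
    if dt <= 0:
        raise ValueError("dt must be positive")
    growth = size_curr - size_prev
    if growth <= 0:
        # Not expanding (static or receding): no collision is predicted
        return math.inf
    return dt * size_prev / growth
```

For example, a pedestrian whose box grows from 200 to 250 pixels over 0.5 s has a time to collision of 2.0 s under these assumptions.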
Full Text Available
Virtual navigation for blind people: Transferring route knowledge to the real-World

https://doi.org/10.1016/j.ijhcs.2019.102369

Guerreiro, João; Sato, Daisuke; Ahmetovic, Dragan; Ohn-Bar, Eshed; Kitani, Kris M.; Asakawa, Chieko (March 2020, International Journal of Human-Computer Studies)

Full Text Available
Future Near-Collision Prediction from Monocular Video: Feasibility, Dataset, and Challenges

Manglik, Aashi; Weng, Xinshuo; Ohn-Bar, Eshed; Kitani, Kris (April 2019, IEEE/RSJ International Conference on Intelligent Robots and Systems)

We explore the possibility of using a single monocular camera to forecast the time to collision between a suitcase-shaped robot being pushed by its user and other nearby pedestrians. We develop a purely image-based deep learning approach that directly estimates the time to collision without relying on explicit geometric depth estimates or velocity information to predict future collisions. While previous work has focused on detecting immediate collisions in the context of navigating Unmanned Aerial Vehicles, that detection was limited to a binary variable (i.e., collision or no collision). We propose a more fine-grained approach to collision forecasting that predicts the exact time to collision in milliseconds, which is more helpful for collision avoidance in the context of dynamic path planning. To evaluate our method, we collected a novel large-scale dataset of over 13,000 indoor video segments, each showing a trajectory of at least one person ending in close proximity (a near collision) with the camera mounted on a mobile suitcase-shaped platform. Using this dataset, we conduct extensive experiments with different temporal windows as input across an exhaustive list of state-of-the-art convolutional neural networks (CNNs). Our results show that our proposed multi-stream CNN is the best model for predicting time to near-collision. The average prediction error of our time-to-near-collision estimates is 0.75 seconds across our test environments.
Full Text Available
A-EXP4: Online Social Policy Learning for Adaptive Robot-Pedestrian Interaction

https://doi.org/10.1109/IROS40897.2019.8967737

Jin, Pengju; Ohn-Bar, Eshed; Kitani, Kris; Asakawa, Chieko (January 2019, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS))

We study self-supervised adaptation of a robot's policy for social interaction, i.e., a policy for active communication with surrounding pedestrians through audio or visual signals. Inspired by the observation that humans continually adapt their behavior when interacting under varying social contexts, we propose Adaptive EXP4 (A-EXP4), a novel online learning algorithm for adapting the robot-pedestrian interaction policy. To address the limitations of bandit algorithms in adapting to unseen and highly dynamic scenarios, we employ a mixture model over the policy parameter space. Specifically, a Dirichlet Process Gaussian Mixture Model (DPMM) is used to cluster the parameters of sampled policies and maintain a mixture model over the clusters, thereby discovering policies suited to the current environmental context in an unsupervised manner. Our simulated and real-world experiments demonstrate the feasibility of A-EXP4 in accommodating interaction with different types of pedestrians while jointly minimizing social disruption during the adaptation process. While the A-EXP4 formulation is kept general for application in a variety of domains requiring continual adaptation of a robot's policy, we specifically evaluate the performance of our algorithm on a suitcase-inspired assistive robotic platform. In this concrete assistive scenario, the algorithm observes how audio signals produced by the navigational system affect the behavior of pedestrians and adapts accordingly. Consequently, we find that A-EXP4 effectively adapts the interaction policy for gently clearing a navigation path in crowded settings, resulting in a significant reduction in empirical regret compared to the EXP4 baseline.
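
The EXP4 baseline mentioned above is the standard exponential-weights bandit algorithm with expert advice; a minimal sketch follows. Each expert supplies a probability distribution over arms, the learner mixes these with exploration rate `gamma`, and after observing the reward of the pulled arm it boosts each expert in proportion to an importance-weighted reward estimate. Rewards are assumed to lie in [0, 1]. Treating experts as candidate interaction policies and arms as audio signals is a hypothetical mapping for illustration, and the A-EXP4 extension, which maintains a Dirichlet process mixture over policy parameters, is not shown.

```python
import math
import random
from typing import List, Sequence, Tuple

Advice = Sequence[Sequence[float]]  # advice[i][j]: expert i's probability for arm j


class Exp4:
    """EXP4: exponential weights for exploration and exploitation with expert advice."""

    def __init__(self, n_experts: int, n_arms: int, gamma: float, seed: int = 0) -> None:
        if not 0.0 < gamma <= 1.0:
            raise ValueError("gamma must be in (0, 1]")
        self.n_arms = n_arms
        self.gamma = gamma
        self.weights = [1.0] * n_experts
        self.rng = random.Random(seed)

    def arm_probs(self, advice: Advice) -> List[float]:
        total = sum(self.weights)
        # Weight-averaged expert advice, mixed with uniform exploration
        mixed = [sum(w * a[j] for w, a in zip(self.weights, advice)) / total
                 for j in range(self.n_arms)]
        return [(1.0 - self.gamma) * m + self.gamma / self.n_arms for m in mixed]

    def select(self, advice: Advice) -> Tuple[int, List[float]]:
        probs = self.arm_probs(advice)
        arm = self.rng.choices(range(self.n_arms), weights=probs)[0]
        return arm, probs

    def update(self, advice: Advice, arm: int, probs: List[float], reward: float) -> None:
        # Importance-weighted reward estimate: nonzero only for the pulled arm
        x_hat = reward / probs[arm]
        for i, a in enumerate(advice):
            # Expert i's estimated gain is its advice for the pulled arm times x_hat
            self.weights[i] *= math.exp(self.gamma * a[arm] * x_hat / self.n_arms)
        # Rescale to avoid overflow; the probabilities depend only on weight ratios
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]
```

For instance, with two experts that always recommend arm 0 and arm 1 respectively, and a reward of 1 only for arm 1, the second expert's weight comes to dominate and the probability of choosing arm 1 approaches 1 - gamma/2.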
Full Text Available
Variability in Reactions to Instructional Guidance during Smartphone-Based Assisted Navigation of Blind Users

https://doi.org/10.1145/3264941

Ohn-Bar, Eshed; Guerreiro, João; Kitani, Kris; Asakawa, Chieko (September 2018, Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies)

Full Text Available
SmartPartNet: Part-Informed Person Detection for Body-Worn Smartphones

https://doi.org/10.1109/WACV.2018.00126

Yu, Heng; Ohn-Bar, Eshed; Yoo, Donghyun; Kitani, Kris M. (March 2018, Winter Conference on Applications of Computer Vision)

Full Text Available
